Big-ETL: Extracting-Transforming-Loading Approach for Big Data
Abstract
The ETL (Extracting-Transforming-Loading) process is responsible for (E)xtracting data from heterogeneous sources, (T)ransforming them, and finally (L)oading them into a data warehouse (DW). The Internet and Web 2.0 now generate data at an ever-increasing rate, confronting information systems (IS) with the challenge of big data. Data integration systems, and ETL in particular, must be revisited and adapted; the well-known solution relies on data distribution and parallel/distributed processing. Among all the dimensions that define the complexity of big data, this paper focuses on its excessive "volume", with the aim of ensuring good performance for ETL processes. In this context, we propose an original approach called Big-ETL (ETL Approach for Big Data), in which we define ETL functionalities that can be run easily on a cluster of computers using the MapReduce (MR) paradigm. Big-ETL thereby allows ETL to be parallelized/distributed at two levels: (i) the ETL process level (coarse granularity), and (ii) the functionality level (fine granularity), which further improves ETL performance.
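The two levels named in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: the function names (`split`, `map_transform`, `reduce_sum`, `big_etl`), the sample records, and the choice of a sum-by-key aggregation as the ETL functionality are all assumptions made for illustration, and a thread pool stands in for a cluster. Splitting the source across workers mimics the coarse (process) level, while writing one functionality as a map step plus a reduce step mimics the fine (functionality) level.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split(rows, n_parts):
    """Coarse level: partition the source so each part gets its own ETL run."""
    return [rows[i::n_parts] for i in range(n_parts)]


def map_transform(row):
    """Map step of one (hypothetical) ETL functionality: clean a record, emit (key, value)."""
    country, amount = row
    return country.strip().upper(), float(amount)


def reduce_sum(pairs):
    """Reduce step: aggregate the emitted values per key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_partition(part):
    """Run the map and reduce steps of the functionality on one partition."""
    return reduce_sum(map_transform(row) for row in part)


def big_etl(rows, n_parts=2):
    """Process partitions in parallel, then merge the partial results."""
    parts = split(rows, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(run_partition, parts))
    # Final merge; a real load step would write this result to the warehouse.
    return reduce_sum(
        (key, value) for partial in partials for key, value in partial.items()
    )


# Illustrative source records (assumed): (country code, amount as text).
rows = [(" fr", "10"), ("dz ", "5.5"), ("FR", "2"), ("dz", "4.5")]
result = big_etl(rows)
print(result)
```

On a real cluster, a MapReduce framework would handle the partitioning, scheduling, and shuffling that `split` and the thread pool only imitate here.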
Related resources
PF-ETL : vers l'intégration de données massives dans les fonctionnalités d'ETL
The ETL (Extracting, Transforming, Loading) process is responsible for extracting data from heterogeneous sources, transforming them, and finally loading them into a data warehouse. New technologies, particularly the Internet and Web 2.0, generate data at an increasing rate and confront information systems (IS) with the challenge of Big Data. These data are characterized by, in addition to their excessive s...
Efficient ETL+Q for Automatic Scalability in Big or Small Data Scenarios
In this paper, we investigate the problem of providing scalability to the data Extraction, Transformation, Load and Querying (ETL+Q) process of data warehouses. In general, data loading, transformation and integration are heavy tasks that are performed only periodically. Parallel architectures and mechanisms are able to optimize the ETL process by speeding up each part of the pipeline process as mor...
Near-real-time Parallel ETL+Q for Automatic Scalability in Bigdata
In this paper, we investigate the problem of providing scalability to the near-real-time ETL+Q (extract, transform, load and querying) process of data warehouses. In general, data loading, transformation and integration are heavy tasks that are performed only periodically during small fixed time windows. We propose an approach to enable the automatic scalability and freshness of any data warehouse a...
Big Data and Specific Analysis Methods for Insurance Fraud Detection
Analytics is the future of big data, because only transforming data into information gives them value and can turn business data into a competitive advantage. Large data volumes, their variety, and the increasing speed of their growth stretch the boundaries of traditional data warehouses and ETL tools. This paper investigates the benefits of Big Data technology and main methods of analysis that can ...
Improving the Extraction, Transformation, and Loading Process in the Data Warehouse with the Help of Parallel Processing
Data warehouses are used to store data in a structure that facilitates data analysis. The process of Extracting, Transforming, and Loading (ETL) covers retrieving the required data from the source system and loading them into the data warehouse. Although the structure of source data (e.g. ER model) and DW (e.g. star schema) are usually specified, there is a clear lack of a ...
Publication date: 2015